Abstract. Many tools and libraries are readily available to build and
operate distributed Web applications. While the setup of operational
environments is comparatively easy, practice shows that their continuous
secure operation is more difficult to achieve, often resulting in
vulnerable systems exposed to the Internet. Authenticated vulnerability
scanners and validation tools represent a means to detect security
vulnerabilities caused by missing patches or misconfiguration, but current
approaches center on the concepts of hosts and operating systems.
This paper presents a language and an approach for the declarative
specification and execution of machine-readable security checks for sets 
of more fine-granular system components depending on each other in 
a distributed environment. Such a language, building on existing stan- 
dards, fosters the creation and sharing of security content among security 
stakeholders. Our approach is exemplified by vulnerabilities of and cor- 
responding checks for Open Source Software commonly used in today's 
Internet applications. 
1 Introduction

The importance of security is nowadays well recognized and mechanisms to en- 
force it are being developed and adopted within enterprises. However, this is not 
sufficient to ensure that security requirements are met, as such mechanisms have 
to be correctly configured and maintained at operations time. In fact, a signif- 
icant share of vulnerabilities results from security misconfiguration, as shown 
by data breach reports such as [1], [2] and projects such as the OWASP Top 
10 [3]. The reason is that activities targeting the creation and maintenance of 
a secure setup, such as patch or configuration management, are labor-intensive
and error-prone. Software vendors, for instance, issue an increasing number of
security advisories, while users, on the other hand, struggle to understand
whether a given vulnerability is exploitable under their particular conditions
and requires immediate patching. As another example, configuration best
practice, provided as prose documentation and supposedly supporting system
administrators, is often very broad and ambiguous.

Due to such difficulties, configuration validation is needed to gain assurance 
about system security, but again, often requires manual intervention, and thus 
is time-consuming and limited to samples. New trends focus on providing
standards for security automation, e.g., the Security Content Automation
Protocol (SCAP, [4]), provided by the National Institute of Standards and
Technology (NIST), whose specifications receive a lot of attention in the scope
of the configuration baseline for IT products used in US federal agencies [4].
SCAP comprises a language that allows the specification of machine-readable
security checks to facilitate the detection of vulnerabilities caused by
misconfiguration. While this represents an important step towards the
standardization and exchange of security knowledge, SCAP focuses on the
granularity of hosts and operating systems, and as such cannot be easily
applied to fine-granular and distributed system components² independently of
their environment, e.g., a Java Web Application (JWA). Furthermore, SCAP does
not leverage standards and technologies in the area of system and configuration
management in order to, for instance, separate check logic from information
about configuration retrieval.

To address these limitations and make the advantages of SCAP available to 
Web security experts, we propose a SCAP-based language and approach for the 
declarative specification and execution of checks for sets of fine-granular
components depending on each other in a distributed environment. Moreover, we
separate the check logic from the retrieval of the configuration values, for
which we rely on existing system management procedures and technologies, e.g.,
Configuration Management Databases (CMDB) as defined in the IT Infrastructure
Library (ITIL). Each check is essentially a set of tests over software component
properties - such as the release and patch level - and configuration settings that 
determine a system component's behavior. Though this is not a limitation of 
the language, we focus on security checks, i.e., one of the most important usages 
is the detection of security vulnerabilities. As an example, the language allows 
the specification of a check to express that the deployment descriptor of any 
JWA deployed in a Servlet container supporting a Servlet specification version 
of at least 3.0 must have the http-only flag enabled, to prevent the access of 
client-side scripts to session cookies. 

This paper is structured as follows. Sect. 2 introduces a sample system based 
on common Open Source Software (OSS), introduces a set of scenarios for con- 
figuration validation, and derives requirements for a configuration validation 
language. Sect. 3 presents the state of the art with regard to the specification of
security checks for software and configuration vulnerabilities. Sect. 4 presents 
the configuration validation language, while Sect. 5 describes our approach. The 
paper concludes with an outlook on future work in Sect. 6. 

² A system component here represents a single installation of a software
component (or product) in a specific system, such as a given deployment of a
Java Web Application in a Servlet container.
2 Use Case and Requirements

Fig. 1: ACME landscape



This section outlines an example landscape composed of a custom application
on top of common OSS, which is thus typical of many real-life systems. An
overview of the network topology and installed software components is shown in
Fig. 1. The service provider ACME operates this landscape for its application
service "eInvoice", which allows customers to manage electronic invoices and to
make them available to their business partners through the Internet. The
application front-end for managing and accessing invoices is implemented as a
JWA. Instances of the application, each dedicated to one customer, are deployed
in Tomcat, in customer-specific context roots. Tomcat instances run inside an
internal subnet and are proxied by the Apache HTTP Server installed on a
physical machine connected to the DMZ. Requests for a customer-dedicated
subdomain of acme.com are forwarded by the reverse proxy to the respective
customer-dedicated instance of the JWA via the Apache JServ Protocol (AJP).

Another machine running in the internal network hosts an LDAP server for the
management of user accounts, as well as a MySQL database used for persistence.

As the system is typical, so are the tasks related to configuration management
and validation. In the following, we describe different scenarios for
configuration validation, which differ in terms of periodicity, urgency
(response time), validation scope, and authorship of configuration checks.

Vulnerability Assessment (S1). This scenario focuses on the detection of
known vulnerabilities. Upon disclosure of a new security vulnerability of off-the- 
shelf applications or software libraries, system administrators need to investigate 
the susceptibility of their system. First, they need to check for the presence of 
affected release and patch levels. This can be difficult in case of software libraries 
embedded into off-the-shelf applications as their presence is often unknown. Sec- 
ond, they need to check whether additional conditions for a successful exploita- 
tion are met. Such conditions often concern specific configuration settings of the 
affected software, as well as the specific usage context and system environment. 
The automation of both activities with the help of machine-readable vulnerability
checks decreases time and effort required to discover a system vulnerability, and 
at the same time increases the precision with which the presence of vulnera- 
bilities can be detected. Precision is important as organizations are typically 
reluctant to apply patches or other measures in a productive environment un- 
less absolutely necessary. Such checks would represent a valuable complement 
to textual descriptions published by security researchers or software vendors in 
vulnerability databases such as the NVD [4]. As an example, CVE-2011-3190
reports a vulnerability in the AJP connector implementation of several Tomcat
releases [5], which, however, only applies under certain conditions, e.g., if
certain connector classes are used, and reverse proxy and Tomcat do not use a
shared secret. A machine-readable check looking at the Tomcat release level and
related configuration settings could easily be provided by the application
vendor (Apache Software Foundation). An example of a critical security bug in a
software library is CVE-2012-0392, which describes a vulnerability in Apache
Struts, a common framework to support the Model-View-Controller paradigm in
JWAs. The detection of this vulnerability is made more difficult by the fact
that end-users typically do not know whether applications installed in their
environment make use of such a library, and they cannot rely on the presence of
a well-established security response process at each of their application
vendors. Thus, security bugs may lie dormant in libraries without the service
operator being aware of them.
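
The two steps of scenario S1 (release-level condition, then additional
configuration conditions) can be illustrated with a minimal Python sketch. It
is not the language proposed in this paper: the class VulnerabilityCheck, the
setting names ajp_connector_enabled and shared_secret_set, and the release
range are hypothetical placeholders chosen for illustration; real values would
have to be taken from the advisory.

```python
from dataclasses import dataclass


def parse_release(release: str) -> tuple:
    """Turn a dotted release string such as '7.0.20' into a comparable tuple."""
    return tuple(int(part) for part in release.split("."))


@dataclass
class VulnerabilityCheck:
    """A release condition plus extra configuration conditions (hypothetical)."""
    product: str
    affected_ranges: list    # list of (first_affected, last_affected) releases
    required_settings: dict  # setting name -> value that enables exploitation

    def is_vulnerable(self, component: dict) -> bool:
        if component["product"] != self.product:
            return False
        release = parse_release(component["release"])
        # Step 1: is an affected release installed?
        in_range = any(
            parse_release(low) <= release <= parse_release(high)
            for low, high in self.affected_ranges
        )
        # Step 2: are the additional exploitation conditions met?
        settings_match = all(
            component["settings"].get(name) == value
            for name, value in self.required_settings.items()
        )
        return in_range and settings_match


# Placeholder values only: ranges and conditions must come from the advisory.
ajp_check = VulnerabilityCheck(
    product="tomcat",
    affected_ranges=[("7.0.0", "7.0.20")],
    required_settings={"ajp_connector_enabled": True, "shared_secret_set": False},
)

exposed = {"product": "tomcat", "release": "7.0.12",
           "settings": {"ajp_connector_enabled": True, "shared_secret_set": False}}
protected = {"product": "tomcat", "release": "7.0.12",
             "settings": {"ajp_connector_enabled": True, "shared_secret_set": True}}
```

With these inputs, only the component without a shared secret is reported,
which reflects the precision argument made above: a release match alone does
not imply that patching is required.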

Configuration Best-Practice (S2). This scenario focuses on establishing
whether best practices are followed. During operations, system administrators
need to periodically check whether the system configurations follow best
practices, for single and distributed system components. Today, these are often
described in prose and evolve over time, thus requiring continuous human
intervention. Examples of best practices are the Tomcat security guide from
OWASP [6] and the SANS recommendations for securing Java deployment
descriptors [7].

Example 1 (SANS recommendation on cookie-based session handling). SANS
recommends configuring the cookie-based session handling for JWAs
(<cookie-config> section of the deployment descriptor), i.e., (i) preventing
the access to session cookies (<http-only> set to true), and (ii) transmitting
cookies securely (<secure> set to true). In particular, the http-only flag is an
example of a recommendation that applies only as of release 3.0 of the Servlet
specification.

Configuration best practices may also cover a set of distributed components,
e.g., the how-to on using the Apache HTTP Server as a reverse proxy for Apache
Tomcat [8]. A language supporting the specification of such best-practice
checks should support their flexible adaptation to a specific environment. A
recommendation related to the session timeout, for instance, may be refined by
an organization to reflect its particular policy.

Compliance with Configuration Policy (S3). This scenario focuses on the
periodic validation of the landscape-specific configuration implementing the
designed policy. Such a configuration includes a set of mandated configuration
settings that an organization expects to be active in its system. As an
example, the configuration that enforces ACME's access control policy comprises
configuration settings of several distributed system components, e.g., the
realm definition of each Tomcat instance, as well as the deployment descriptor
of each Java application instance. In particular, the deployment descriptor has
to allow the role admin-role to access the URL path /manager/*. Moreover, the
realm of Tomcat has to refer to the LDAP server located at 192.168.2.1. This
example illustrates that configuration checks aiming to assess compliance with
a given configuration policy strongly reflect a particular system and
environment, and are therefore authored within the organization rather than by
external parties, as in the previous scenarios.

A language supporting the above scenarios has to fulfill the following
requirements.

(RL1) The language must support the definition of configuration checks for
diverse software components (e.g., network-level firewalls or application-level
access control systems) and diverse technologies.

(RL2) The language must be expressive enough to cover new technologies or con- 
figuration formats without requiring extensions. This would avoid the need 
to update the language interpreter every time a new extension is published. 

(RL3) It must be possible to specify target components by defining conditions 
over properties such as name, release, and supported specification, or over 
the existence of relationships between components. This is necessary in cases 
where externally provided checks must be applied to all instances of the 
affected software components (scenarios S1 and S2).

(RL4) Motivated by scenario S3, it must be possible to specify target compo- 
nents by referring to specific instances of a software component. 

(RL5) It must be possible to validate the configurations of different, potentially 
distributed system components within one check. 

(RL6) Checks must be uniquely identifiable, declarative, standardized and cer- 
tifiable, to support trusted knowledge exchange among security tools and 
stakeholders, e.g., software vendors, experts, auditors, or operations staff. 

(RL7) The language must support parametrization in order to adapt externally
provided checks to a specific configuration policy.

(RL8) The specification of checks must be separated from the collection of the 
involved configuration settings from a given managed domain. 

3 State of the Art 

Prior art for the definition of the configuration validation language comprises 
several specifications out of the Security Content Automation Protocol (SCAP), 
as well as proprietary languages supported by vulnerability and patch scanners. 

SCAP [11] is a suite of specifications that support automated configuration,
vulnerability and patch checking, as well as security measurement. Some of the
specifications are widely applied in industry, e.g., the Common Vulnerabilities
and Exposures (CVE, http://cve.mitre.org), and those related to configuration
validation will be discussed with regard to the above-described requirements.
Note that several approaches assess a system's overall security level by analyzing 
and reasoning about the potential combination of individual vulnerabilities (ex- 
ploits) by an adversary [9], [10]. Though referring to SCAP specifications, these 
approaches do not look into the vulnerability specification itself, but use the 
language and related tools merely for the discovery of individual vulnerabilities. 
Common Platform Enumeration (CPE, http://cpe.mitre.org) is an XML-based
standard for the specification of structured names for information
technology systems, software, platforms, and packages. It allows the definition
of names representing classes of platforms which can be compared in order to
establish if, e.g., two names are equal or if one of the names represents a subset
of the systems represented by the other. CPE 2.3, the latest version, consists
of four modular specifications which work together in layers: (i) CPE Naming
providing a formal name format, (ii) CPE Language allowing the description of
complex platforms, (iii) CPE Matching providing a method for checking names
against a system, and (iv) CPE Dictionary binding text and tests to a name.

While the specifications CPE Naming and CPE Matching allow the definition 
and comparison of single software components according to properties such as 
vendor or product name, the CPE Language specification does not meet (RL3) 
with regard to component relations. It supports the specification of a complex 
platform through a logical condition over several CPE Names, but the seman- 
tics of their relationship is not explicitly defined. The typical interpretation used 
in many CVE entries is that a complex platform condition is met as soon as 
all software components are installed on the same machine. This interpretation, 
however, is in many cases not sufficient to state that a vulnerability exists.
CVE-2003-0042, for instance, is only exploitable if Tomcat actually uses a given
JDK version; the mere presence of both components on the same system is not
sufficient. This interpretation is even more misleading if vulnerabilities are
caused by combinations of client-side and server-side components, e.g.,
CVE-2012-0287. A special kind of relationship is the composition of software
components, e.g., in the case of Java libraries. Today, each vendor of an
application that embeds a vulnerable library needs to issue a dedicated CVE, as
CPE is insufficient to detect the use of a given library (in an application).

Open Vulnerability Assessment Language (OVAL, [12]) defines a lan- 
guage for the definition of security tests detecting the presence of vulnerabilities 
or configuration issues on a computer system (machine). It defines several XML
schemas: (i) OVAL System Characteristics represent system configuration
information that is subject to testing, (ii) OVAL Definitions specify conditions
for the presence of a specified machine state (vulnerability, configuration, patch
state, etc.), and (iii) OVAL Results report the assessment result, i.e., the
comparison of OVAL Definitions and OVAL System Characteristics.

Since OVAL already fulfills some of the aforementioned requirements, the
language proposed in Sect. 4 is to a good extent based on OVAL concepts.
In line with the SCAP design goals, OVAL supports standardized, unambiguous,
and exchangeable representations of configuration checks (RL6) as well as
variables for parametrization (RL7). However, a significant limitation is that
OVAL checks (like CPE) work on the granularity of machines (computer systems).
This impacts several other requirements. With regard to (RL1), it is difficult,
sometimes impossible, to write configuration checks for fine-granular system
components, e.g., JWAs, independently of their software computing environment
(container). The reason is that generic OVAL objects from the independent
schema (e.g., textfilecontent54_object) are relative to the machine's file
system, which varies from one Servlet container to the other. The definition of
container-specific objects (e.g., spwebapplication_object for Microsoft
SharePoint), on the other hand, restricts the use of checks to dedicated
environments. Requirement (RL2) is not fulfilled, as OVAL requires the extension
of several schemas to address new software components. This either requires tool
vendors to constantly update the language interpreter, or leads to a fragmented
market where tools only support a subset of the language. We believe that the
broad adoption of OVAL could be reached more easily by the use of generic types
(RL2), e.g., on the basis of XML, thereby leveraging the fact that it is used
for many application-level configuration formats. With regard to (RL3), (RL4)
and (RL5), it is impossible in OVAL to specify a target for checks that look at
distributed components, since a set of OVAL definitions and their tests is meant
to be executed on a single machine. Furthermore, OVAL does not clearly separate
check logic from the retrieval of the actual configuration values (RL8), thereby
failing to leverage industry efforts in the area of IT Service and Application
Management (ITSAM). The deployment descriptor of a JWA, for instance, can be
retrieved by several means and potentially from different sources (the actual
component, or a configuration store with copies). Mixing these concerns makes
the work of check authors difficult and error-prone, as they cannot focus on the
check logic alone (e.g., the session configuration of a deployment descriptor),
but must also take care of the retrieval of values, e.g., the identification of
a file path depending on installation directories and environment variables. To
allow the separation of these concerns, the check language itself must be
agnostic to potential configuration sources, the latter being taken care of by
administrators.

As a representative vulnerability and patch scanner, we consider Nessus
(http://www.tenable.com/products/nessus), which is a widely adopted tool and
comes with a proprietary syntax for the definition of so-called audit checks.
Organizations can either write custom checks according to this language, or
subscribe to a commercial feed to receive compliance checks tailored for a
variety of standards and regulations, e.g., PCI DSS
(https://www.pcisecuritystandards.org). Having comparable expressivity, checks
written in this proprietary language can be transformed into SCAP content, which
is why Nessus and similar tools were SCAP-validated by MITRE. SCAP and Nessus'
proprietary language also have in common that they focus on operating systems,
which makes it difficult to specify checks on a more fine-granular level, i.e.,
for objects which cannot be easily identified relative to the OS: custom items
for Windows and Unix require, for instance, the specification of file paths,
which is not necessarily possible for JWAs or Web services; built-in checks for
Unix hide the configuration source from the check author, but instead of making
the source customizable, it is hard-coded (RL8). Checks considering distributed
system components are not supported at all (RL5). Nessus also does not allow the
applicability of a check to be conditioned on component properties (e.g.,
release level) or component relationships (RL3), but only on hard-coded keywords
such as Unix. As a proprietary language processed only by Nessus, it is neither
extensible by third parties (RL2) nor standardized (RL6).

Fig. 2: Configuration validation language class diagram



4 Configuration Validation Language 

The configuration validation language allows the definition of checks for selected 
software components and addresses the use cases presented in Sect. 2. It includes 
the definition of the checks as well as of their results. This section introduces 
all the concepts used within the language, and defines the extensions we carried 
out over the OVAL standard. We formally define the semantics of the language 
without binding to a specific syntax. Notice that in the definitions we only 
consider the parts of the OVAL standard which are extended by our language. 
As OVAL is XML-based, a straightforward implementation of our formalism is 
an XML serialization. 

Fig. 2 shows the main concepts of the configuration validation language. 
The concepts are organized into three main areas. The Check and Target areas
concern the definition of the configuration checks and of the affected software
components, respectively; the System area contains elements corresponding to
actual configurations and components of a managed domain.

The Check area (top left of Fig. 2) concerns the definition of checks in the 
form of tests comparing an expected and an actual value. This area relies on the 
OVAL standard [12]. The concepts we borrow and extend are shown in Fig. 2 
and prefixed with "OVAL". In a nutshell, a definition is characterized by an
arbitrarily complex boolean combination of tests, and a test defines an
evaluation involving an object (possibly containing a set of other objects) and
zero or more states. As described in Sect. 3, the existing OVAL objects do not
fulfill requirements (RL2) and (RL8). To fulfill them, we defined a new test,
object, and state, generic enough to apply to multiple configurations of
multiple software components and independent of the collection mechanisms. The
test and state we defined are not shown in Fig. 2, as it is the object, XML
Config Object, that contains the major contributions. The XML Config Object is
characterized by three attributes: the type denoting a type of configuration
relevant for a software component, the schema denoting the format in which the
configurations are represented, and the query expressing how to identify the
object within the configuration. This object also overcomes the OVAL drawbacks
regarding (RL1) discussed in Sect. 3.

Example 2 (Object, state, and test for http-only flag). The XML Config Object
can be used to specify the recommendation described in Ex. 1. In the excerpt
below, type (line 2) indicates that the configuration considered is a deployment
descriptor (independent of the computing environment), schema (line 3) refers to
the location of the schema for the deployment descriptor of J2EE web
applications, and the XPath query (line 4) points to the http-only configuration.

<xmlconfiguration_object id="oval:sans.security:obj:1">
  <type>deployment descriptor</type>
  <schema>http://java.sun.com/xml/ns/j2ee</schema>
  <query>//*session-config/*cookie-config/*http-only/text()</query>
</xmlconfiguration_object>

By modifying only the query element, all the other recommendations of Ex. 1 can
be specified. Moreover, by also modifying the type and schema, our object can be
used for any other XML-based configuration. The expected value for the
configuration is defined through an xmlconfiguration_state defining true as the
expected value for the http-only tag. Finally, the OVAL test,
xmlconfiguration_test, contains the object and state above, which are used to
evaluate the configuration.
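
How an interpreter might evaluate such an object, state, and test can be
sketched in Python as follows. This is a minimal illustration under stated
assumptions, not the authors' implementation: the class and function names are
hypothetical, the collected descriptor is made up, and because Python's
ElementTree supports only a subset of XPath, the query is rewritten with the
'{*}' namespace wildcard (available since Python 3.8).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class XmlConfigObject:
    type: str    # kind of configuration, e.g. "deployment descriptor"
    schema: str  # schema the configuration follows
    query: str   # path identifying the setting inside the configuration


@dataclass
class XmlConfigState:
    expected: str  # expected value of the queried setting


def evaluate(obj: XmlConfigObject, state: XmlConfigState,
             configurations: dict) -> bool:
    """Compare the queried value with the expected one.

    `configurations` maps a configuration type to XML text that was collected
    beforehand, so the test never deals with file paths or other retrieval
    details.
    """
    root = ET.fromstring(configurations[obj.type])
    node = root.find(obj.query)
    return node is not None and (node.text or "").strip() == state.expected


# Query from the excerpt, adapted to ElementTree's XPath subset (assumption).
http_only_obj = XmlConfigObject(
    type="deployment descriptor",
    schema="http://java.sun.com/xml/ns/j2ee",
    query=".//{*}session-config/{*}cookie-config/{*}http-only",
)
http_only_state = XmlConfigState(expected="true")

# Hypothetical, already collected deployment descriptor.
collected = {
    "deployment descriptor": (
        '<web-app xmlns="http://java.sun.com/xml/ns/javaee">'
        "<session-config><cookie-config>"
        "<http-only>true</http-only>"
        "</cookie-config></session-config>"
        "</web-app>"
    )
}
```

Changing only the query (for instance to the secure element) yields the test
for the second recommendation, mirroring the remark above that the object is
reusable across settings.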

Definition 1 (OVAL Definition). An OVAL Definition $OD \subseteq \mathcal{T}$
is a set of OVAL Tests.

Example 3 (OVAL Definition for SANS). The OVAL definition checking for the
SANS recommendations described in Ex. 1 is a set of tests, one for each
recommendation, i.e., $OD_{sans} = \{t_{http\text{-}only}, t_{secure\text{-}flag}\}$.

According to OVAL, a definition is a boolean combination of tests. As SANS
requires all recommendations to be followed, all the tests involved are combined
by a boolean OR, so that an alarm is raised whenever one of the recommendations
is not followed. $t_{http\text{-}only}$ (line 3) is described in Ex. 2. All
other tests can be defined analogously.

<definition id="oval:sans.security:def:1">
  <criteria operator="OR">
    <criterion test_ref="oval:sans.security:tst:1" comment="HttpOnly flag"/>
    <criterion test_ref="oval:sans.security:tst:2" comment="Secure flag"/>
  </criteria>
</definition>
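
The boolean combination expressed by the criteria element can be sketched in
Python as follows. The sketch assumes that a test result of True means that the
corresponding recommendation is violated, which is how the use of OR is
motivated in the text; the function name is hypothetical, and only the AND and
OR operators are covered.

```python
from typing import Dict, List


def evaluate_criteria(operator: str, test_refs: List[str],
                      results: Dict[str, bool]) -> bool:
    """Combine individual test results with the given boolean operator."""
    values = [results[ref] for ref in test_refs]
    if operator == "OR":
        return any(values)   # alarm as soon as one test reports a violation
    if operator == "AND":
        return all(values)   # alarm only if every test reports a violation
    raise ValueError(f"unsupported operator: {operator}")


# Test identifiers as used in the definition above.
SANS_TESTS = ["oval:sans.security:tst:1", "oval:sans.security:tst:2"]

# Assumed results: HttpOnly is configured correctly, the Secure flag is not.
alarm = evaluate_criteria(
    "OR",
    SANS_TESTS,
    {"oval:sans.security:tst:1": False, "oval:sans.security:tst:2": True},
)
```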
Table 1: Properties description

Property            Description                       Property  Description
------------------  --------------------------------  --------  ---------------------------
product             Product name, e.g., Struts        unc_path  UNC path for shared location
vendor              Product vendor, e.g., Apache      ctx_root  JWA context root
release             Product release, e.g., 2.3.1.1    ip_jmx    IP address of JMX endpoint
sup_spec, req_spec  Supported/Required specification  port_jmx  Port number of JMX endpoint



Table 2: Relations description

Relation   Description
---------  ------------------------------------------------------------------
depl_in    deployed in, models a component installed in another
comm_with  communicates with, represents network communication
comp_of    composed of, represents the internal structure of applications
           (e.g., linked libraries)
instr_set  instruction set, for either compiled (x86, x64) or interpreted
           (Java Runtime) binaries

The Target area (top right of Fig. 2) allows the definition of targets for the
checks. A target definition is an abstract concept representing either a
software component or a relation, which can be defined over software components
or over relations themselves. A software component is characterized by a set of
conditions on specific properties such as those in Tab. 1 (left side). A
relation defines the relationship between software components. We distinguish
three kinds of relations. A static relation, i.e., "composed of", represents the
internal structure of a software product. Run-time (dynamic) relations, i.e.,
"deployed in" and "communicates with", define relations among software
components running in a landscape. Finally, boolean relations (AND, OR) combine
either static or dynamic relations. Dynamic and boolean relations can be nested,
whereas the static relation can only be applied to software components. These
types of relations, combined with the possibility to nest them, allow the
definition of a set of software components satisfying an arbitrarily complex
expression.

Definition 2 (Software Component). A software component
$SC \subseteq \mathcal{C}$ is a set of conditions. A condition
$C \in \mathcal{C}$ is a tuple $C = (P, \theta, V)$, where

- $P \in \mathcal{P}$ is a property name,

- $\theta \in \{=, <, >, \geq, \leq\}$ is an operator,

- $V \in dom(P)$ is a value for the property.

We define $\mathcal{R}$ as a set of relations. Examples are listed in Tab. 2.
We define $\mathcal{R}_{\mathbb{N}} = \mathcal{R} \times \mathbb{N}$ as the set
of numbered relations where any relation can occur an arbitrary number of times
and is uniquely identified by a natural number. In the examples we omit the
natural number when no ambiguity arises.

Definition 3 (Target Definition). A target definition is a tuple
$TD = (SCS, RS, \rho)$ where

- $SCS$ is a set of software components (cf. Def. 2),

- $RS \subseteq \mathcal{R}_{\mathbb{N}}$ is a set of numbered relations,

- $\rho : RS \to (SCS \cup RS) \times (SCS \cup RS)$ is a total and acyclic
  function mapping a relation to the pair of elements, denoted as $\rho_1$ and
  $\rho_2$, sharing the relation (either software components or relations).

A target definition $TD = (SCS, RS, \rho)$ is valid iff $|SCS| = 1$ when
$RS = \emptyset$.
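
The constraints of Def. 3 can be made concrete with a small Python sketch. The
representation is an assumption made for illustration only: software components
are referred to by hypothetical names, numbered relations are (name, number)
pairs, and the method is_valid checks totality and acyclicity of the mapping in
addition to the validity condition stated above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

Condition = Tuple[str, str, str]          # (property, operator, value)
SoftwareComponent = FrozenSet[Condition]  # Def. 2: a set of conditions
Relation = Tuple[str, int]                # numbered relation, e.g. ("depl_in", 1)


@dataclass
class TargetDefinition:
    scs: Dict[str, SoftwareComponent]  # named software components (SCS)
    rs: Set[Relation]                  # numbered relations (RS)
    rho: Dict[Relation, tuple]         # relation -> (first, second) element

    def is_valid(self) -> bool:
        if not self.rs:
            # Without relations, exactly one software component is allowed.
            return len(self.scs) == 1
        elements = set(self.scs) | self.rs
        # rho must be total on RS and map into (SCS u RS) x (SCS u RS).
        if set(self.rho) != self.rs:
            return False
        if any(e not in elements for pair in self.rho.values() for e in pair):
            return False
        return not self._has_cycle()

    def _has_cycle(self) -> bool:
        visiting, done = set(), set()

        def visit(rel) -> bool:
            if rel in done:
                return False
            if rel in visiting:
                return True  # relation reachable from itself
            visiting.add(rel)
            cyclic = any(visit(e) for e in self.rho[rel] if e in self.rs)
            visiting.discard(rel)
            done.add(rel)
            return cyclic

        return any(visit(rel) for rel in self.rs)


# A web application deployed in a container (names are illustrative).
td = TargetDefinition(
    scs={"webapp": frozenset(),
         "webappcont": frozenset({("sup_spec", ">=", "3.0")})},
    rs={("depl_in", 1)},
    rho={("depl_in", 1): ("webapp", "webappcont")},
)
```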

Example 4 (Software Component and Target Definition for SANS). The SANS
recommendations apply to JWAs developed according to one of the releases of the
Servlet specification and deployed in a web application container supporting
that specification. In particular, the recommendations in Ex. 1 refer to
release 3.0. According to Def. 2, a software component for the web application
container can be defined as the set containing a single condition referring to
the supported specification,
$SC_{webappcont} = \{(sup\_spec, \geq, Java\_Servlet\_3.0)\}$. As the
recommendation applies to all JWAs deployed therein, the software component for
the web application can be specified as the empty set,
$SC_{webapp} = \emptyset$. Finally, the target definition, according to Def. 3,
can be expressed as $TD_{sans} = (SCS_{sans}, RS_{sans}, \rho_{sans})$ where
$SCS_{sans} = \{SC_{webapp}, SC_{webappcont}\}$, $RS_{sans} = \{depl\_in\}$, and
$\rho_{1,sans}(depl\_in) = \{SC_{webapp}\}$,
$\rho_{2,sans}(depl\_in) = \{SC_{webappcont}\}$.
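
To illustrate how the conditions of a software component select installed
components, the following Python sketch evaluates a condition set against the
properties of one installation. It rests on simplifying assumptions: the
function names are hypothetical, and the specification value is reduced to a
dotted number such as "3.0" so that releases can be compared numerically.

```python
import operator

# Operators of Def. 2, written in ASCII.
OPERATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
             ">=": operator.ge, "<=": operator.le}


def comparable(value: str):
    """Dotted numeric strings become integer tuples; others stay strings."""
    parts = value.split(".")
    if all(part.isdigit() for part in parts):
        return tuple(int(part) for part in parts)
    return value


def matches(software_component: set, properties: dict) -> bool:
    """True if the installation's properties satisfy every condition.

    An empty condition set matches any installation, as for the web
    application component in the example above.
    """
    for prop, op, value in software_component:
        if prop not in properties:
            return False
        if not OPERATORS[op](comparable(properties[prop]), comparable(value)):
            return False
    return True


# Container condition of the example, with the value simplified to "3.0".
sc_webappcont = {("sup_spec", ">=", "3.0")}
sc_webapp = set()

tomcat7 = {"product": "tomcat", "vendor": "apache", "sup_spec": "3.0"}
tomcat6 = {"product": "tomcat", "vendor": "apache", "sup_spec": "2.5"}
```

Comparing integer tuples rather than strings avoids ordering "3.10" before
"3.9", which a plain string comparison would do.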

We extend the OVAL standard by referring each OVAL definition to a target 
definition, i.e., to a set of related software components, and referring each OVAL 
test contained in the definition to a software component of the target definition. 
Thus we fulfill requirements (RL3) and (RL5). We name the resulting new arti- 
fact check definition. Note that this artifact is not represented by a single class 
in Fig. 2 but it involves several of the concepts therein presented and formalized 
above. Def. 1 and 3 provide the building blocks for the check definition. 

Definition 4 (Check Definition). A check definition is a tuple
$CD = (OD, TD, \tau)$ where

- $OD \subseteq \mathcal{T}$ is an OVAL definition,

- $TD = (SCS, RS, \rho)$ is a target definition,

- $\tau : OD \to SCS$ is a total function that maps an OVAL test included in
  the definition $OD$ to the software component of the target definition $TD$
  to which it applies.

Example 5 (Check Definition for SANS). Given $OD_{sans}$ and $TD_{sans}$ defined
in Ex. 3 and Ex. 4 resp., a check definition for SANS recommendations on cookies
is $CD_{sans} = (OD_{sans}, TD_{sans}, \tau_{sans})$ where
$\tau_{sans}(t) = SC_{webapp}$ for all $t \in OD_{sans}$.
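
The requirement that $\tau$ be a total function from the tests of the OVAL definition into the software components of the target definition can be checked mechanically. The sketch below is illustrative only: tests and software components are represented by hypothetical string names, and $\tau$ by a dictionary.

```python
def is_check_definition(od, scs, tau):
    """True iff tau is a total function from the tests in od into scs."""
    return set(tau) == set(od) and all(sc in scs for sc in tau.values())


# Names chosen for illustration; they mirror the tests and components of the
# SANS example.
od_sans = {"t_http_only", "t_secure_flag"}
scs_sans = {"SC_webapp", "SC_webappcont"}
tau_sans = {t: "SC_webapp" for t in od_sans}

print(is_check_definition(od_sans, scs_sans, tau_sans))  # True
```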

The System area (bottom of Fig. 2) contains the concepts characterizing
systems in a landscape and their configurations. A system component represents
a single installation of a software component in a specific domain. As the purpose
is to identify its configurations, the system component is defined as a set of
attributes denoting how the configurations can be retrieved. The configurations
required are given by the OVAL tests which are defined for software components.
To evaluate the tests, the objects they contain have to be retrieved for each
installation of the software component, i.e., for each system component. The tests
to be performed on system components are defined through the test mapping.
The set of attributes necessary to collect a configuration is given by the collector
(more details about how system components are derived starting from the target
definition and the collector can be found in Sect. 5). By allowing the separation
of the check logic from the attributes needed for the collection, our language
fulfills requirement (RL8).

Definition 5 (Collector). A collector is a tuple $K = (SC_K, PS, O_K)$ where
$SC_K$ is a set of conditions, $PS \subseteq \mathcal{P}$ is a set of properties,
and $O_K$ is a query over OVAL objects.

Example 6 (Collector for Web Applications deployment descriptor). A collector
for web application deployment descriptors has to define the set of attributes
for retrieving the deployment descriptor of the web applications installed in
the landscape. Several alternatives are viable, e.g., accessing a shared
file system via the Universal Naming Convention (UNC) or relying on the
JMX interface of Tomcat. These alternatives can be defined as two collectors,
$K_{unc} = (SC_{K_{webapp}}, \{\mathit{unc\_path}\}, O_{K_{webapp}})$ and
$K_{jmx} = (SC_{K_{webapp}}, \{\mathit{ctx\_root}, \mathit{ip\_jmx}, \mathit{port\_jmx}\}, O_{K_{webapp}})$,
where $SC_{K_{webapp}} = \{(\mathit{req\_spec}, =, \mathit{Java\_Servlet\_3.0})\}$
is the same for both as they apply to the same software component, and
$O_{K_{webapp}}$ is an XPath query over the XML serialization of the object
(omitted for the sake of brevity).

Definition 6 (System Component). A system component $SI \subseteq \mathcal{A}$
is a set of attributes. An attribute is a tuple $A = (P, V)$, where
$P \in \mathcal{P}$ and $V \in \mathrm{dom}(P)$ are properties and values, resp.

Example 7 (System Component for SANS). The check definition for SANS in
Ex. 5 includes the software component $SC_{webapp} = \emptyset$ defined in Ex. 4,
which is referred to by an XML Config Test. Moreover, the web applications
installed in the managed domain of Fig. 1 are characterized by the property of
supporting the Servlet specification 3.0. Thus the collectors defined in Ex. 6
can be used for establishing the set of attributes of the resulting system
components. By using $K_{unc}$, the resulting system component for one
installation of the eInvoice web application sold by ACME is
$SI_{unc} = \{(\mathit{unc\_path},$ \\192.168.2.3\path\to\web.xml$)\}$.
By using $K_{jmx}$, the resulting system component is
$SI_{jmx} = \{(\mathit{ctx\_root},$ /manager/*$), (\mathit{ip\_jmx},$ 192.168.2.2$), (\mathit{port\_jmx},$ 8059$)\}$.

Definition 7 (System Test). A system test is ST = (SIS, OD, TM) where 

- SIS is a set of system components, 

- $OD \subseteq \mathcal{T}$ is an OVAL definition, i.e., a set of tests,

- $TM \subseteq OD \times SIS$ is a set of test mappings defining which test of
the definition applies to which system component.

Example 8 (System Test for SANS). The check definition
$CD_{sans} = (OD_{sans}, TD_{sans}, \tau_{sans})$ defined in Ex. 5 originates
several system tests, one for each set of software components installed in the
managed domain fulfilling the target definition $TD_{sans}$. Given
$OD_{sans} = \{t_{http\text{-}only}, t_{secure\text{-}flag}\}$,
$TD_{sans} = (SCS_{sans}, RS_{sans}, \rho_{sans})$, and $\tau(t) = SC_{webapp}$,
a system test defining the tests to be performed for one possible installation
of the software components is $ST_{sans} = (SIS_{sans}, OD_{sans}, TM_{sans})$
where $SIS_{sans} = \{SI_{jmx}\}$ and
$TM_{sans} = \{(t_{http\text{-}only}, SI_{jmx}), (t_{secure\text{-}flag}, SI_{jmx})\}$.
Notice that no system component for $SC_{webappcont}$ is included in
$SIS_{sans}$ as no tests apply to it.

The system test refers each test to a specific system component, thus (RL4) is met.

Finally, the OVAL Item in Fig. 2 represents the configuration collected from 
a system component for the OVAL object defined in the OVAL test. By evalu- 
ating such items according to the test, a boolean result for the test is produced. 
Based on the test results, the boolean result of the definition is also evaluated. 
Differently from OVAL, our OVAL Items may derive from different systems;
however, this does not affect the evaluation algorithm defined in [12], which
we rely on. A check definition originates several system tests, each one
originating a check result.

Definition 8 (Check Result). A check result is a tuple $CR = (ST, \omega)$ where

- $ST = (SIS, OD, TM)$ is a system test,

- $\omega : TM \to \{\top, \bot\}$ is a function that maps each test mapping to
its result, i.e., the boolean values true ($\top$) or false ($\bot$).
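
As an illustration of Def. 8, the function $\omega$ can be represented as a dictionary keyed by test mappings. This is a sketch under stated assumptions: `evaluate` is a hypothetical callback standing in for the OVAL evaluation of a test against the items collected from a system component, and the definition result is taken to be the conjunction of all test results, whereas an actual OVAL definition may combine tests with other criteria operators.

```python
def check_result(tm, evaluate):
    """Build omega: map every test mapping (t, SI) in tm to a boolean."""
    return {(t, si): bool(evaluate(t, si)) for (t, si) in tm}


def definition_result(omega):
    # Assumption: the definition's criteria combine all tests with AND.
    return all(omega.values())


tm_sans = {("t_http_only", "SI_jmx"), ("t_secure_flag", "SI_jmx")}
# Hypothetical outcome: the HttpOnly check passes, the secure-flag check fails.
collected = {"t_http_only": True, "t_secure_flag": False}
omega = check_result(tm_sans, lambda t, si: collected[t])
print(definition_result(omega))  # False
```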

5 Approach 

The language presented in Sect. 4 separates the checks' logic from the systems 
to which they apply. In this section we establish the link between these two 
aspects, thereby describing how the checks can be instantiated and executed in 
a concrete landscape. 

The overall approach is outlined in Fig. 3. External and internal authors
(from the perspective of an organization) can define, independently from the
landscape, checks $CD$ (Def. 4) for known vulnerabilities affecting software
components (cf. (S1)), and for best practices of single or multiple software
components sharing relations (cf. (S2)). An additional input is the set of
collector definitions $\mathcal{K}$, which has to be provided by system
administrators as it creates the link between the software components used in
the checks and the attributes of system components which allow the collection
of the configurations. The TD Evaluator module takes the above artifacts as
input and is responsible for producing all the system tests $ST$ defining which
test has to be executed on which system component. To produce the System Test
artifact, the TD Evaluator relies on a Data Source, an authoritative source of
information about the software components installed in a managed domain. We
assume a single Data Source to provide information about several aspects of the
managed domain, ranging from the properties of installed software (e.g. product
names and vendors), or the internal structure of applications (e.g. linked
libraries), up to architectural details on the deployment or the network
interaction among different pieces of software. Since such information is often
scattered over several repositories within an organization (e.g., CMDBs), the
Data Source is a federated set of views over these repositories, which
constitute the interface to our language. Although strong, this assumption is
not unrealistic. Indeed, several theoretical formulations of this problem are
tackled in the literature on data integration [13] [14]. Furthermore, the
increasing adoption of standards such as DMTF's CMDBf [15] demonstrates the
practical feasibility of configuration data federation.

Fig. 3: Detection of vulnerabilities approach. (Figure not reproduced; it
shows the External/Internal Author, the System Admin, the TD Evaluator and the
OVAL Processor.)

The system test can also be manually pro- 
vided by system administrators in case of checks 
for selected system components (cf. scenario 
(S3)). System tests are then processed by the 
OVAL Processor module that interprets the 
OVAL content and collects the objects defined 
for each system component within ST. The con- 
figurations collected from distributed systems are then evaluated and check re- 
sults CR are produced, highlighting existing misconfiguration issues (if any). 

A key step of the approach is the generation of the system tests based on 
the data source. In the following we formally define the interpretation of target 
definitions w.r.t. a data source, which provides information about the properties 
of software components deployed within a managed domain. We then describe 
how this leads to the generation of system tests. 

Informally, a data source can be seen as a particular instantiation of software
component properties (cf. Def. 2) and target definition relations (cf. Def. 3) for
a managed domain. Let $\mathbb{I}$ be the domain of instances of software
components, namely software component identifiers, containing one unique symbol
for each software component installed in a given managed domain. The data source
then maps every software component identifier to the actual values of its
properties and links it to the other software component identifiers it is
related to.

Definition 9 (Data Source). A data source is the pair of sets
$DS = (\Pi, \Gamma)$. $\Pi$ contains a partial function
$\pi_P : \mathbb{I} \to \mathrm{dom}(P)$ for each property $P \in \mathcal{P}$,
while $\Gamma$ includes a relation $\gamma_R \subseteq \mathbb{I} \times \mathbb{I}$
for each symbol $R \in \mathcal{R}$.

Example 9 (Data Source). Figure 4 depicts a tabular representation of the data
source $DS_1$ for the example landscape of Fig. 1. Due to space limitations, only
a subset of the properties listed in Tab. 1 and of the relations of Tab. 2 is
considered.

Fig. 4: Example of data source instance. (Figure not reproduced.)

A software component can be seen as a simple conjunctive query ranging
over properties of software deployed within a managed domain. The data source
provides the necessary views on the managed domain to answer such a query.
The answer consists of the set of software component identifiers matching all
the conditions within the software component. If it contains no conditions, the
answer is the entire domain of software component identifiers $\mathbb{I}$. This
evaluation is performed by the data source interpretation of software
components, given by the mapping $[\cdot]_{DS} : \mathcal{SC} \to 2^{\mathbb{I}}$:

$$[\{(P, \theta, v)\} \cup SC]_{DS} = \{\, i \in \mathbb{I} \mid \pi_P(i)\ \theta\ v \,\} \cap [SC]_{DS}. \qquad (1)$$
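
The recursion in (1) amounts to intersecting, condition by condition, the identifiers that satisfy each condition, starting from the whole domain. The following sketch makes this concrete. The data source contents, the encoding of Servlet releases as tuples, and the operator table `OPS` are assumptions made for illustration; they are not taken from Fig. 4.

```python
import operator

# Comparison operators usable in conditions (an assumed, minimal set).
OPS = {"=": operator.eq, ">=": operator.ge, "<=": operator.le}


def interpret_sc(sc, pi, identifiers):
    """Return the identifiers satisfying every condition (P, op, v) in sc."""
    answer = set(identifiers)  # an empty sc selects the whole domain
    for prop, op, value in sc:
        values = pi.get(prop, {})  # the partial function pi_P, as a dict
        answer &= {i for i in identifiers
                   if i in values and OPS[op](values[i], value)}
    return answer


# Hypothetical data source: two containers and one web application.
identifiers = {"t1", "t2", "wa"}
pi = {"sup_spec": {"t1": (3, 0), "t2": (3, 0)}}  # supported Servlet release
sc_webappcont = {("sup_spec", ">=", (3, 0))}

containers = interpret_sc(sc_webappcont, pi, identifiers)
everything = interpret_sc(set(), pi, identifiers)
```

Because each $\pi_P$ is partial, an identifier for which the property is undefined (here `wa` for `sup_spec`) never satisfies a condition on that property.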



A target definition $TD = (SCS, RS, \rho)$ is instead a more complex selection
predicate (cf. Def. 3) and there can be several sets of software component
identifiers which satisfy it. The interpretation of $TD$ over a data source
$DS$, $\llbracket TD \rrbracket_{DS}$, provides all such sets. This is done by
relying on two interpretation functions, one providing the sets of software
component identifiers, and one providing a function that maps each software
component identifier to the corresponding software component.

The interpretation function
$\llbracket \cdot \rrbracket_{DS,\rho} : (\mathcal{SC} \cup \mathcal{R}) \to 2^{2^{\mathbb{I}}}$
associates every $SC \in \mathcal{SC}$ and $R \in \mathcal{R}$ to a set of sets
of software component identifiers, as defined in (2) and (3), respectively.
Notice that this function depends both on the data source $DS$ and the function
$\rho$ that carries the structure of target definition expressions.



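$$\llbracket SC \rrbracket_{DS,\rho} = \{ [SC]_{DS} \} \qquad (2)$$

$$\llbracket R \rrbracket_{DS,\rho} =
\begin{cases}
\ldots & \text{if } R = \wedge \\
\ldots & \text{if } R = \vee \\
\{\, \{v_1, \ldots, v_n, w_1, \ldots, w_m\} \mid \{v_1, \ldots, v_n\} \in \llbracket \rho_1(R) \rrbracket_{DS,\rho},\ \{w_1, \ldots, w_m\} \in \llbracket \rho_2(R) \rrbracket_{DS,\rho},\ (v_{1 \leq i \leq n}, w_{1 \leq j \leq m}) \in \gamma_R \,\} & \text{otherwise}
\end{cases}
\qquad (3)$$

(The right-hand sides of the cases $R = \wedge$ and $R = \vee$ are not legible
in the source text and are left open here.)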

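Similarly, the interpretation function
$\langle\!\langle \cdot \rangle\!\rangle_{DS,\rho} : (\mathcal{SC} \cup \mathcal{R}) \to (\mathbb{I} \to \mathcal{SC})$
maps every $SC \in \mathcal{SC}$ and $R \in \mathcal{R}$ to a function $\sigma$
associating each software component identifier to the corresponding software
component, according to (4) and (5).

$$\langle\!\langle SC \rangle\!\rangle_{DS,\rho} = \sigma, \text{ where } \sigma(i) = SC,\ \forall i \in [SC]_{DS} \qquad (4)$$

$$\langle\!\langle R \rangle\!\rangle_{DS,\rho} = \sigma, \text{ where } \sigma(i) = \langle\!\langle \rho_1(R) \rangle\!\rangle_{DS,\rho}(i),\ \forall i \in \mathrm{dom}(\langle\!\langle \rho_1(R) \rangle\!\rangle_{DS,\rho}) \text{ and } \sigma(j) = \langle\!\langle \rho_2(R) \rangle\!\rangle_{DS,\rho}(j),\ \forall j \in \mathrm{dom}(\langle\!\langle \rho_2(R) \rangle\!\rangle_{DS,\rho}) \qquad (5)$$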

Finally, the evaluation function for a valid target definition
$TD = (SCS, RS, \rho)$ over the data source $DS$,
$\llbracket \cdot \rrbracket_{DS} : \mathcal{TD} \to (2^{2^{\mathbb{I}}} \times (\mathbb{I} \to \mathcal{SC}))$,
associates a $TD$ to the pair $(I^*, \sigma)$, where $I^*$ is a set of sets of
software component identifiers and $\sigma$ a function mapping every
$i \in I \in I^*$ to a $SC \in SCS$. As expressed in (6), the definition of
$\llbracket \cdot \rrbracket_{DS}$ relies on the aforementioned recursive
interpretation functions of all the elements within the target definition
expression, starting, in the general case, from the only relation $R_0$ which
never appears in the $\rho$ co-domain. In case $RS = \emptyset$, we know from
Def. 3 that $\exists!\, SC_0 \in SCS$ and therefore $SC_0$ is the only element
being interpreted.

$$\llbracket TD \rrbracket_{DS} =
\begin{cases}
(\llbracket R_0 \rrbracket_{DS,\rho}, \langle\!\langle R_0 \rangle\!\rangle_{DS,\rho}), \text{ with } \{R_0\} = \mathrm{dom}(\rho) \setminus \mathrm{cod}(\rho) & \text{if } RS \neq \emptyset \\
(\llbracket SC_0 \rrbracket_{DS,\rho}, \langle\!\langle SC_0 \rangle\!\rangle_{DS,\rho}), \text{ with } \{SC_0\} = SCS & \text{otherwise.}
\end{cases}
\qquad (6)$$

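Example 10. We hereby compute the interpretation of the target definition
$TD_{sans}$, introduced in Ex. 4, w.r.t. the data source $DS_1$, shown in Ex. 9.

First, we recognize (Eq. (6)) that
$\llbracket TD_{sans} \rrbracket_{DS_1} = (I^*_{sans}, \sigma_{sans}) = (\llbracket \mathit{depl\_in} \rrbracket_{DS_1,\rho}, \langle\!\langle \mathit{depl\_in} \rangle\!\rangle_{DS_1,\rho})$,
since $\mathit{depl\_in} \in \mathrm{dom}(\rho) \setminus \mathrm{cod}(\rho)$.

In order to obtain $\llbracket \mathit{depl\_in} \rrbracket_{DS_1,\rho}$,
according to (3), we now need to compute the two following terms:

(i) $\llbracket \rho_1(\mathit{depl\_in}) \rrbracket_{DS_1,\rho} = \llbracket SC_{webapp} \rrbracket_{DS_1,\rho} = \{ [\emptyset]_{DS_1} \} = \{ \mathbb{I} \}$;

(ii) $\llbracket \rho_2(\mathit{depl\_in}) \rrbracket_{DS_1,\rho} = \llbracket SC_{webappcont} \rrbracket_{DS_1,\rho} = \{ [\{(\mathit{sup\_spec}, \geq, \mathit{Java\_Servlet\_3.0})\}]_{DS_1} \} = \{ \{ i \in \mathbb{I} \mid \pi_{\mathit{sup\_spec}}(i) \geq \mathit{Java\_Servlet\_3.0} \} \} = \{ \{t_1, t_2\} \}$.

We then have
$I^*_{sans} = \llbracket \mathit{depl\_in} \rrbracket_{DS_1,\rho} = \{ \{v, w\} \mid v \in \mathbb{I},\ w \in \{t_1, t_2\},\ (v, w) \in \{(w_a, t_1), (w_b, t_2), (w_c, t_2)\} \} = \{ \{w_a, t_1\}, \{w_b, t_2\}, \{w_c, t_2\} \}$.

Analogously, by applying (5), we obtain
$\sigma_{sans} = \langle\!\langle \mathit{depl\_in} \rangle\!\rangle_{DS_1,\rho} = \{ w_a : SC_{webapp},\ w_b : SC_{webapp},\ w_c : SC_{webapp},\ t_1 : SC_{webappcont},\ t_2 : SC_{webappcont} \}$.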
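
The computation of Example 10 can be replayed in a few lines. This is a sketch restricted to the case of a single binary relation between two software components, as in $TD_{sans}$. It is not a general implementation of (3) and (5). The identifier names mirror the example, and the answers of the two software components are entered directly instead of being computed from a data source.

```python
# Domain of software component identifiers and the depl_in relation of DS_1.
identifiers = {"wa", "wb", "wc", "t1", "t2"}
depl_in = {("wa", "t1"), ("wb", "t2"), ("wc", "t2")}

# Answers of the two operands, as obtained in steps (i) and (ii).
webapp_ids = set(identifiers)   # SC_webapp has no conditions: whole domain
webappcont_ids = {"t1", "t2"}   # identifiers satisfying SC_webappcont


def interpret_relation(left, right, gamma):
    """Select the pairs (v, w) of gamma with v in left and w in right."""
    return [(v, w) for (v, w) in sorted(gamma) if v in left and w in right]


pairs = interpret_relation(webapp_ids, webappcont_ids, depl_in)

# I*: one set of identifiers per selected pair.
i_star = {frozenset(pair) for pair in pairs}

# sigma: the first element of a pair plays the role of SC_webapp, the second
# that of SC_webappcont (a simplification of Eq. (5) for this example).
sigma = {}
for v, w in pairs:
    sigma[v] = "SC_webapp"
    sigma[w] = "SC_webappcont"
```

Each element of `i_star` later gives rise to one system test, and `sigma` tells which software component, and hence which tests, each identifier in it stands for.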

As a last step, the TD Evaluator needs to identify one or more system tests,
mapping each OVAL test to the system component carrying the information
about how to collect the object.

A check definition $CD = (OD, TD, \tau)$ is defined for the target definition
$TD$, which is interpreted over a data source, resulting in a pair
$\llbracket TD \rrbracket_{DS} = (I^*, \sigma)$. Every $I \in I^*$ is a set of
software component identifiers satisfying the $TD$ expression. Therefore one
system test has to be created for every such set $I$.

When the TD Evaluator processes a check definition, it must identify a
matching collector $K$ among the set $\mathcal{K}$ of all the ones defined for a
given managed domain. This has to be done for every software component
identifier $i \in I$, and provides the set of properties $PS$ necessary to
collect the to-be-checked configurations for specific OVAL Objects from $i$.
For this reason, every $K \in \mathcal{K}$ (cf. Def. 5) references a software
component $SC_K$ and contains an XPath query $O_K$ matching the XML
serialization of the OVAL Objects it applies to. We write $t \models O_K$
whenever the XML serialization of all the OVAL Objects referenced within an
OVAL Test $t$ satisfies the XPath query $O_K$.

Given a collector property set $PS$ and a software component identifier $i$,
Eq. (7) defines how to retrieve the corresponding system component from a data
source $DS$, through the interpretation function
$\|\cdot\|_{DS}(i) : 2^{\mathcal{P}} \to \mathcal{SI}$.

$$\|PS\|_{DS}(i) = \|\{P_1, \ldots, P_n\}\|_{DS}(i) = \{ (P_1, \pi_{P_1}(i)), \ldots, (P_n, \pi_{P_n}(i)) \}. \qquad (7)$$
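
Eq. (7) is a lookup of each property of the collector in the data source. A minimal sketch follows, assuming the partial functions $\pi_P$ are stored as nested dictionaries. The function name `system_component` is hypothetical, and the property values are those used for the JMX and UNC collectors in the examples. Properties that are undefined for an identifier are skipped, which is what the matching condition of Def. 10 later tests for.

```python
def system_component(ps, i, pi):
    """Return {(P, pi_P(i))} for the properties P in ps that are defined on i."""
    return {(p, pi[p][i]) for p in ps if i in pi.get(p, {})}


# Property values of the extended example data source (one dict per pi_P).
pi = {
    "ctx_root": {"wa": "/manager/*"},
    "ip_jmx": {"wa": "192.168.2.2"},
    "port_jmx": {"wa": 8059},
    "unc_path": {"wb": r"\\192.168.2.3\path\to\web.xml"},
}

ps_jmx = {"ctx_root", "ip_jmx", "port_jmx"}
si_jmx = system_component(ps_jmx, "wa", pi)
```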

The conditions required to determine whether a collector matches to a soft- 
ware component identifier are now formalized by the following definition. 

Definition 10 (Matching Collector). For a $CD = (OD, TD, \tau)$, where
$TD = (SCS, RS, \rho)$, let $\llbracket TD \rrbracket_{DS} = (I^*, \sigma)$ be an
interpretation of $TD$ over $DS$ and $\tau^{-1} : SCS \to 2^{OD}$ be the inverse
of $\tau$, mapping every $SC$ to the set $\{ t \in OD \mid \tau(t) = SC \}$. We
then say that $K = (SC_K, PS, O_K)$ matches to $i \in I \in I^*$, iff

$$i \in [SC_K]_{DS} \ \text{ and } \ P \in PS \Rightarrow \exists (P, \cdot) \in \|PS\|_{DS}(i) \ \text{ and } \ t \in \tau^{-1}(\sigma(i)) \Rightarrow t \models O_K.$$

Given the interpretation $\llbracket TD \rrbracket_{DS} = (I^*, \sigma)$ of a
target definition within a check definition $CD = (OD, TD, \tau)$, we are now
in a position to associate each $I \in I^*$ to a system test
$ST_I = (SIS_I, OD, TM_I)$, constructed as follows. (i) $OD$ is the same OVAL
Definition contained in $CD$. (ii) Every element $SI \in SIS_I$ is a system
component, i.e. a collection of attributes associated to properties of the
software component identifier which allow configuration information to be
collected from it. For every $i \in I$ we first need to find a matching
collector $K$ carrying such a set of properties $PS$, and we then retrieve the
system component $SI$, i.e. the attributes corresponding to the properties in
$PS$, from the data source $DS$. (iii) $TM_I$ maps every test $t \in OD$ to a
system component $SI \in SIS_I$.

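Eq. (8) finally specifies how the system test's components $SIS_I$ and $TM_I$,
informally described above, are built by the TD Evaluator.

$$\forall i \in I, \text{ if } \exists K \in \mathcal{K} \text{ s.t. } K \text{ matches to } i, \text{ then } \|PS\|_{DS}(i) \in SIS_I \text{ and } (t, \|PS\|_{DS}(i)) \in TM_I \ \ \forall t \in \tau^{-1}(\sigma(i)). \qquad (8)$$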

Example 11. Let us consider the check definition
$CD_{sans} = (OD_{sans}, TD_{sans}, \tau_{sans})$, introduced in Ex. 5, and the
data source interpretation of its target definition
$\llbracket TD_{sans} \rrbracket_{DS_1} = (I^*_{sans}, \sigma_{sans})$, which has
been derived in Ex. 10. Three sets of software component identifiers satisfy
the target definition, namely
$I^*_{sans} = \{ \{w_a, t_1\}, \{w_b, t_2\}, \{w_c, t_2\} \} = \{ I_a, I_b, I_c \}$,
hence three system tests will be created. Among those, we shall only discuss,
for brevity, the system tests $ST_{I_a}$ and $ST_{I_b}$, related to $I_a$ and
$I_b$ resp.

For the sake of this example we extend the data source
$DS_1 = (\Pi_1, \Gamma_1)$ such that it includes the properties required by the
collectors (cf. Ex. 6). Let such an extended data source be
$DS'_1 = (\Pi_1 \cup \{ \pi_{\mathit{ctx\_root}}, \pi_{\mathit{ip\_jmx}}, \pi_{\mathit{port\_jmx}}, \pi_{\mathit{unc\_path}} \}, \Gamma_1)$,
where: $\pi_{\mathit{ctx\_root}}(w_a) =$ /manager/*,
$\pi_{\mathit{ip\_jmx}}(w_a) =$ 192.168.2.2, $\pi_{\mathit{port\_jmx}}(w_a) =$ 8059,
and $\pi_{\mathit{unc\_path}}(w_b) =$ \\192.168.2.3\path\to\web.xml.

According to Def. 10 the collector $K_{jmx}$ matches to the software component
identifier $w_a$ (and not to $w_b$), as (i) $w_a \in [SC_{K_{webapp}}]_{DS}$,
(ii) $\pi_{\mathit{ctx\_root}}(w_a)$, $\pi_{\mathit{ip\_jmx}}(w_a)$,
$\pi_{\mathit{port\_jmx}}(w_a)$ are all defined in $DS$ (while this is not the
case for $w_b$), and (iii) both $t_{http\text{-}only} \models O_{K_{webapp}}$ and
$t_{secure\text{-}flag} \models O_{K_{webapp}}$ hold. From analogous reasoning
it follows that $K_{unc}$ matches to $w_b$ (and not to $w_a$).

By applying (8) we finally derive that
$ST_{I_a} = (\{SI_{jmx}\}, OD_{sans}, \{ (t_{http\text{-}only}, SI_{jmx}), (t_{secure\text{-}flag}, SI_{jmx}) \}) = ST_{sans}$,
as anticipated in Ex. 7 and 8. Analogously, we obtain
$ST_{I_b} = (\{SI_{unc}\}, OD_{sans}, \{ (t_{http\text{-}only}, SI_{unc}), (t_{secure\text{-}flag}, SI_{unc}) \})$.
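
The derivation of Example 11 can be reproduced with a short program that combines Def. 10 and Eq. (8). It is a sketch under several assumptions: the `Collector` type and all function names are hypothetical; the set of identifiers satisfying $SC_K$ is entered directly (all three web applications are assumed to require Servlet 3.0); the XPath query $O_K$ is replaced by the set of tests it accepts; and when several collectors match, the first one is used.

```python
from typing import NamedTuple


class Collector(NamedTuple):
    sc_ids: frozenset   # identifiers satisfying SC_K (precomputed)
    ps: frozenset       # PS: properties needed to reach the configuration
    accepts: frozenset  # tests t with t |= O_K (stand-in for the XPath query)


def system_component(ps, i, pi):
    """Eq. (7): attributes (P, pi_P(i)) for the properties of ps defined on i."""
    return frozenset((p, pi[p][i]) for p in ps if i in pi.get(p, {}))


def matches(k, i, pi, tests):
    """Def. 10: i satisfies SC_K, PS is defined on i, the tests satisfy O_K."""
    return (i in k.sc_ids
            and all(i in pi.get(p, {}) for p in k.ps)
            and all(t in k.accepts for t in tests))


def system_test(ids, od, tau_inv, sigma, collectors, pi):
    """Eq. (8): build (SIS_I, OD, TM_I) for one set of identifiers."""
    sis, tm = set(), set()
    for i in sorted(ids):
        tests = tau_inv.get(sigma[i], frozenset())
        for k in collectors:
            if matches(k, i, pi, tests):
                si = system_component(k.ps, i, pi)
                sis.add(si)
                tm |= {(t, si) for t in tests}
                break  # assumption: the first matching collector is used
    return sis, od, tm


od_sans = frozenset({"t_http_only", "t_secure_flag"})
tau_inv = {"SC_webapp": od_sans}  # inverse of tau: component -> its tests
sigma = {"wa": "SC_webapp", "wb": "SC_webapp", "wc": "SC_webapp",
         "t1": "SC_webappcont", "t2": "SC_webappcont"}
pi = {
    "ctx_root": {"wa": "/manager/*"},
    "ip_jmx": {"wa": "192.168.2.2"},
    "port_jmx": {"wa": 8059},
    "unc_path": {"wb": r"\\192.168.2.3\path\to\web.xml"},
}
webapps = frozenset({"wa", "wb", "wc"})
k_unc = Collector(webapps, frozenset({"unc_path"}), od_sans)
k_jmx = Collector(webapps, frozenset({"ctx_root", "ip_jmx", "port_jmx"}), od_sans)

st_a = system_test({"wa", "t1"}, od_sans, tau_inv, sigma, [k_unc, k_jmx], pi)
st_b = system_test({"wb", "t2"}, od_sans, tau_inv, sigma, [k_unc, k_jmx], pi)
```

For the set containing `wa`, only the JMX collector matches, so the system test contains a single system component built from the three JMX properties. For the set containing `wb`, only the UNC collector matches. The container identifiers contribute nothing, since no collector applies to them.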
6 Conclusion and Future Work



This paper presents a formal approach to specify and execute declarative and
unambiguous checks able to detect vulnerabilities caused by system
misconfiguration. It extends the state of the art on configuration validation,
as security checks can be specified for fine-granular components in a
distributed environment and the check logic is separated from the configuration
retrieval.

A proof of concept has been developed to explore the feasibility of our
approach using the example of OWASP and SANS recommendations for JWAs, with
a CMDB as data source for resolving target definitions, and JMX for the
collection of configuration settings. In future work, we will evaluate the
prototype in near-real-world environments that comprise a greater number of
system components. Furthermore, we plan to generate security checks and
checklists in an automated fashion to facilitate scenario (S3), where checks
are used for gaining assurance about compliance with system-specific
configuration policies. This would make it possible to gain assurance without
the need to manually author checks on a low technical level. Lastly, we intend
to investigate the usage in cloud scenarios, where cloud providers could use
and offer a corresponding tool for ensuring the security of consumer-managed
resources.

References 

1. 7Safe, the University of Bedfordshire: UK security breach investigations report
2010. http://www.7safe.com/breach_report/Breach_report_2010.pdf (2010)

2. Verizon: 2009 data breach investigations report. Verizon,
http://www.7safe.com/breach_report/Breach_report_2010.pdf (2009)

3. Williams, J., Wichers, D.: Top 10 most critical web application security risks.
OWASP, https://www.owasp.org/index.php/Top_10_2010-A6 (2010)

4. http://{scap,usgcb,nvd}.nist.gov

5. http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.35

6. https://www.owasp.org/index.php/Securing_tomcat

7. http://software-security.sans.org/blog/2010/08/11/security-misconfigurations-java-webxml-files

8. http://tomcat.apache.org/connectors-doc/generic_howto/proxy.html

9. Chen, X., Zheng, Q., Guan, X.: An OVAL-based active vulnerability assessment
system for enterprise computer networks. ISF (2008) 573-588

10. Ou, X., Govindavajhala, S., Appel, A.W.: MulVal: a logic-based network security
analyzer. In: USENIX Security Symposium. (2005)

11. Waltermire, D., Quinn, S., Scarfone, K.: The technical specification for the Security
Content Automation Protocol (SCAP): SCAP version 1.1. NIST,
http://csrc.nist.gov/publications/nistpubs/800-126-rev1/SP800-126r1.pdf (2011)

12. J. Baker, M. Hansbury, D.H.: The OVAL language specification (version 5.10.1).
MITRE Corporation,
http://oval.mitre.org/language/version5.10.1/OVAL_Language_Specification_01-20-2012.pdf (2012)

13. Ullman, J.D.: Information integration using logical views. ICDT (1997) 19-40

14. Lenzerini, M.: Data integration: a theoretical perspective. PODS (2002) 233-246

15. DMTF Distributed Management Task Force: Configuration Management Database
(CMDB) Federation Specification. DMTF Technical Report DSP0252. (2010)



